What common traits, if any, do award-winning songs share? Can intrinsic traits of songs, combined with metrics defined by Spotify, identify the musical features of award winners? In this paper, we describe the collection, processing, and analysis of a dataset of roughly 1,000 popular songs, both award-winning and not.
JOAN TO DO: Write how we chose the 1,000 songs originally; how we ended up with 867; and which Grammy winners we stuck with (only Grammy winners, or also Grammy nominees?)
# Set the working directory to this file's folder
library("rstudioapi")
setwd(dirname(getActiveDocumentContext()$path))
load("final_df_n_str.RData")
Sys.setenv(LANG = "en")
# Load necessary libraries
library(pROC)
library(MASS)
library(ROSE)
library(confintr)
library(ggplot2)
library(correlation)
library(corrplot)
library(class)
library(caret)
library(glmnet)
# Selecting the relevant variables
data = final_df_n_str
data = data[,c("track_name", "artist_name", "IsWinner", "Year","year",
"followers", "acousticness", "danceability", "duration_ms",
"energy", "instrumentalness", "key", "liveness", "loudness",
"mode", "tempo", "time_signature", "valence")]
# Merge the two year variables
data$Year[data$Year == "Undefined"] <- data$year[data$Year == "Undefined"]
data = data[,c("track_name","artist_name", "IsWinner", "Year", "followers",
"acousticness", "danceability", "duration_ms",
"energy", "instrumentalness", "key", "liveness", "loudness",
"mode", "tempo", "time_signature", "valence")]
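# Eliminating duplicates: locate the rows of the duplicated tracks
which(data$track_name == "Closing Time")
which(data$track_name == "Smells Like Teen Spirit")
which(data$track_name == "Don't Wanna Fight")
data[914, ]
data[789,]
data[669,]
data = data[-c(669, 789, 914),]
# Dropping songs released before 1992 (Year is still character here, so cast first)
sum(as.integer(data$Year) < 1992)
nrow(data)
data = data[as.integer(data$Year) >= 1992,]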
# Creating row names
names = paste0(data$track_name, " - ", data$artist_name)
# Eliminating unusable variables
data = data[,c("IsWinner", "Year", "followers", "acousticness",
"danceability", "duration_ms", "energy",
"instrumentalness", "key", "liveness", "loudness", "mode",
"tempo", "time_signature", "valence")]
data = cbind(names = names, data)
# Casting variables (Grammy nominees are grouped with winners in the positive class)
data$IsWinner[data$IsWinner == "Winner"] = 1
data$IsWinner[data$IsWinner == "Nominee"] = 1
data$IsWinner[data$IsWinner == "Nothing"] = 0
data$IsWinner = as.integer(data$IsWinner)
data$Year = as.integer(data$Year)
data$mode = as.factor(data$mode)
data$key = as.factor(data$key)
data$time_signature = as.factor(data$time_signature)
# Summarizing the cleaned data
summary(data)
summary(data$IsWinner)
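The cleaning steps above drop duplicates by hard-coded row numbers, which stop being valid as soon as the input file changes. A minimal sketch of an index-free alternative follows, assuming that duplicates share both track and artist name; the `songs` data frame and its values are illustrative stand-ins, not rows from our dataset.

```r
# Toy data standing in for the song table (values are illustrative only)
songs <- data.frame(
  track_name  = c("Closing Time", "Closing Time", "Smells Like Teen Spirit", "Yellow"),
  artist_name = c("Semisonic", "Semisonic", "Nirvana", "Coldplay"),
  Year        = c("1998", "1998", "1991", "2000"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each (track, artist) pair
is_dup <- duplicated(songs[, c("track_name", "artist_name")])
songs_unique <- songs[!is_dup, ]

# Cast Year before filtering so the comparison is numeric, not string-based
songs_unique$Year <- as.integer(songs_unique$Year)
songs_recent <- songs_unique[songs_unique$Year >= 1992, ]

print(songs_recent)
```

If two rows for the same song differ in award status, `duplicated()` keeps whichever comes first, so the rows may need sorting beforehand.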
To analyze the songs, we used metrics intrinsic to the music as well as derived metrics computed by the music streaming service Spotify. The intrinsic metrics were duration, musical key, modality (major or minor), tempo, time signature, and genre. Spotify also computes what it calls “audio features” (defined in the table below), which it uses in its own analysis of songs when creating playlists, suggesting music, and so on. We used these metrics to complement the intrinsic ones and to gain more insight into what might make a song award-winning.
| Audio Feature | Definition |
|---|---|
| Acousticness | A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic. |
| Danceability | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
| Energy | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| Instrumentalness | Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0. |
| Liveness | Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live. |
| Loudness | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
| Speechiness | Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. |
| Valence | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
Although Spotify does not openly share how they determine these metrics, we found them suitable to assist in our analysis.
As a final processing step, we split the data into training and test datasets. The training dataset contains 80% of the original data; the remaining 20% forms the test dataset, which we use to evaluate our model after training. Testing on never-before-seen data shows not only how well the model performs but also how well it generalizes.
# Splitting training and test set
training_size = floor(0.8 * nrow(data))
set.seed(42)
train_ind = sample(seq_len(nrow(data)), size = training_size)
training_set = data[train_ind,]
test_set = data[-train_ind,]
summary(training_set)
## names IsWinner Year followers
## Length:693 Min. :0.0000 Min. :1992 Min. : 2597
## Class :character 1st Qu.:0.0000 1st Qu.:2001 1st Qu.: 868777
## Mode :character Median :0.0000 Median :2010 Median : 2350118
## Mean :0.1876 Mean :2009 Mean : 4338356
## 3rd Qu.:0.0000 3rd Qu.:2018 3rd Qu.: 5615666
## Max. :1.0000 Max. :2023 Max. :44692754
##
## acousticness danceability duration_ms energy
## Min. :0.0000032 Min. :0.130 Min. : 78591 Min. :0.0975
## 1st Qu.:0.0016900 1st Qu.:0.419 1st Qu.: 206413 1st Qu.:0.6040
## Median :0.0278000 Median :0.522 Median : 237800 Median :0.7570
## Mean :0.1553733 Mean :0.512 Mean : 251635 Mean :0.7182
## 3rd Qu.:0.2050000 3rd Qu.:0.607 3rd Qu.: 278267 3rd Qu.:0.8820
## Max. :0.9880000 Max. :0.894 Max. :1355938 Max. :0.9960
##
## instrumentalness key liveness loudness mode
## Min. :0.00e+00 9 : 95 Min. :0.0157 Min. :-18.148 0:203
## 1st Qu.:4.90e-06 2 : 94 1st Qu.:0.0989 1st Qu.: -8.086 1:490
## Median :3.21e-04 7 : 84 Median :0.1240 Median : -6.253
## Mean :6.25e-02 0 : 81 Mean :0.2004 Mean : -6.645
## 3rd Qu.:1.49e-02 11 : 68 3rd Qu.:0.2320 3rd Qu.: -4.767
## Max. :8.95e-01 4 : 58 Max. :0.9980 Max. : -1.574
## (Other):213
## tempo time_signature valence
## Min. : 48.58 1: 2 Min. :0.0494
## 1st Qu.: 99.19 3: 37 1st Qu.:0.3050
## Median :121.14 4:649 Median :0.4640
## Mean :123.28 5: 5 Mean :0.4725
## 3rd Qu.:141.93 3rd Qu.:0.6310
## Max. :205.85 Max. :0.9730
##
# Checking whether the winner-to-non-winner ratio is preserved in the training set
sum(data$IsWinner == 1)/ sum(data$IsWinner == 0)
## [1] 0.2159888
sum(training_set$IsWinner == 1)/ sum(training_set$IsWinner == 0)
## [1] 0.2309059
training_set
## # A tibble: 693 × 16
## names IsWinner Year followers acousticness danceability duration_ms energy
## <chr> <int> <int> <int> <dbl> <dbl> <int> <dbl>
## 1 Nightm… 0 2010 6262809 0.000318 0.554 374453 0.949
## 2 I'd Do… 0 1993 1034322 0.465 0.366 718600 0.561
## 3 Patien… 1 2022 4802169 0.000195 0.318 441402 0.87
## 4 Someda… 1 2006 6137375 0.254 0.533 295560 0.59
## 5 I Know… 0 2020 1775452 0.33 0.323 344693 0.323
## 6 Find M… 1 2021 4416749 0.256 0.873 293849 0.809
## 7 Weak -… 0 2017 2923531 0.118 0.67 201159 0.643
## 8 Walk O… 1 2001 11148674 0.00379 0.528 296240 0.832
## 9 Black … 1 2018 2341237 0.197 0.558 259893 0.902
## 10 Spectr… 0 2012 6399322 0.00225 0.578 218190 0.946
## # ℹ 683 more rows
## # ℹ 8 more variables: instrumentalness <dbl>, key <fct>, liveness <dbl>,
## # loudness <dbl>, mode <fct>, tempo <dbl>, time_signature <fct>,
## # valence <dbl>
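The simple random split above leaves the winner-to-non-winner ratio slightly different in the training set (0.231) than in the full data (0.216). One way to keep the class balance nearly identical is to sample within each class. The following is a sketch of that idea on toy data, not the split we actually used; the `toy` data frame and its class counts are assumptions chosen for illustration.

```r
set.seed(42)
# Toy labels with roughly the class balance reported above (illustrative only)
toy <- data.frame(id = 1:100, IsWinner = rep(c(1, 0), times = c(18, 82)))

# Sample 80% of the row indices within each class separately
idx_by_class <- split(seq_len(nrow(toy)), toy$IsWinner)
train_ind <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = floor(0.8 * length(idx)))]
}), use.names = FALSE)

toy_train <- toy[train_ind, ]
toy_test  <- toy[-train_ind, ]

# Class proportions in the full data and in each split
mean(toy$IsWinner)
mean(toy_train$IsWinner)
mean(toy_test$IsWinner)
```

The `caret` package loaded earlier offers `createDataPartition()` for the same purpose; the base R version is shown here only to make the mechanics explicit.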
First, we examined the continuous variables.
attach(training_set)
## The following object is masked _by_ .GlobalEnv:
##
## names
# Correlations between continuous variables
cor_matrix = cor(training_set[,c(-1, -2, -10, -13, -15)])
corrplot(cor_matrix)
pairs(training_set[,c(-1, -2, -10, -13, -15)], lower.panel = panel.smooth)
knitr::include_graphics("yourPlot.pdf", error = FALSE)
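We measured the association between pairs of categorical variables with Cramér’s V, a normalized version of the chi-square statistic: $V = \sqrt{\chi^2 / (n \cdot \min(r - 1, c - 1))}$, where $n$ is the number of observations and $r$ and $c$ are the numbers of levels of the two variables. It ranges from 0 (no association) to 1 (perfect association).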
# Association measure for categorical variables (Cramer's V is a normalized
# version of the chi-square statistic)
cramersv(matrix(c(as.numeric(key), as.numeric(mode)), ncol = 2))
## [1] 0.3275984
cramersv(matrix(c(as.numeric(key), as.numeric(time_signature)), ncol = 2))
## [1] 0.305952
cramersv(matrix(c(as.numeric(mode), as.numeric(time_signature)), ncol = 2))
## [1] 0.1425218
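For reference, Cramér's V can be computed directly from a contingency table of counts in base R. The sketch below uses a hypothetical helper, `cramers_v()`, and toy factors. As far as we can tell from its documentation, `confintr::cramersv()` expects a `chisq.test()` result, a table or matrix of counts, or a two-column data frame; if that reading is right, a two-column matrix of level codes may be interpreted as counts, so the values above are worth re-checking against a table-based form such as this one.

```r
# Hypothetical helper: Cramer's V from two categorical vectors
cramers_v <- function(x, y) {
  tab  <- table(x, y)                       # contingency table of counts
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n    <- sum(tab)                          # number of observations
  k    <- min(nrow(tab), ncol(tab)) - 1     # min(r - 1, c - 1)
  as.numeric(sqrt(chi2 / (n * k)))
}

# Toy factors (illustrative only)
a <- factor(c("x", "x", "y", "y", "x", "y"))
b <- factor(c("u", "v", "u", "v", "u", "v"))

cramers_v(a, a)  # a variable against itself: perfect association
cramers_v(a, b)  # weak association
```

On the training set, the equivalent call would be `cramers_v(key, mode)`, assuming no factor level is empty.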
Next, we tested for associations between each of the categorical variables (key, mode, and time signature) and each of the continuous variables using one-way ANOVA. A significant F-test (p < 0.05) indicates that the mean of the continuous variable differs between at least two levels of the categorical variable.
# Association between continuous and categorical variables
# Key
fol_key.aov <- aov(followers ~ key)
summary(fol_key.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 9.557e+14 8.688e+13 2.129 0.0167 *
## Residuals 681 2.780e+16 4.082e+13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aco_key.aov <- aov(acousticness ~ key)
summary(aco_key.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 0.76 0.06888 1.154 0.316
## Residuals 681 40.65 0.05969
dan_key.aov <- aov(danceability ~ key)
summary(dan_key.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 0.368 0.03343 1.911 0.0351 *
## Residuals 681 11.913 0.01749
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dur_key.aov <- aov(duration_ms ~ key)
summary(dur_key.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 1.302e+11 1.184e+10 1.671 0.0758 .
## Residuals 681 4.825e+12 7.086e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ene_key.aov <- aov(energy ~ key)
summary(ene_key.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 0.806 0.07330 1.926 0.0334 *
## Residuals 681 25.922 0.03806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ins_key.aov <- aov(instrumentalness ~ key)
summary(ins_key.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 0.503 0.04574 1.696 0.0701 .
## Residuals 681 18.362 0.02696
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
liv_key.aov <- aov(liveness ~ key)
summary(liv_key.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 0.437 0.03974 1.184 0.294
## Residuals 681 22.859 0.03357
loud_key.aov <- aov(loudness ~ key)
summary(loud_key.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 109 9.871 1.467 0.139
## Residuals 681 4583 6.730
tem_key.aov <- aov(tempo ~ key)
summary(tem_key.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 12755 1159.5 1.384 0.176
## Residuals 681 570619 837.9
val_key.aov <- aov(valence ~ key)
summary(val_key.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## key 11 0.77 0.07024 1.486 0.132
## Residuals 681 32.18 0.04725
# Mode
fol_mode.aov <- aov(followers ~ mode)
summary(fol_mode.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 2.630e+14 2.630e+14 6.38 0.0118 *
## Residuals 691 2.849e+16 4.123e+13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aco_mode.aov <- aov(acousticness ~ mode)
summary(aco_mode.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 0.32 0.3200 5.382 0.0206 *
## Residuals 691 41.09 0.0595
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dan_mode.aov <- aov(danceability ~ mode)
summary(dan_mode.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 0.012 0.01171 0.66 0.417
## Residuals 691 12.269 0.01775
dur_mode.aov <- aov(duration_ms ~ mode)
summary(dur_mode.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 9.401e+07 9.401e+07 0.013 0.909
## Residuals 691 4.955e+12 7.171e+09
ene_mode.aov <- aov(energy ~ mode)
summary(ene_mode.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 0.374 0.3737 9.798 0.00182 **
## Residuals 691 26.354 0.0381
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ins_mode.aov <- aov(instrumentalness ~ mode)
summary(ins_mode.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 0.017 0.01729 0.634 0.426
## Residuals 691 18.848 0.02728
liv_mode.aov <- aov(liveness ~ mode)
summary(liv_mode.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 0.038 0.03803 1.13 0.288
## Residuals 691 23.258 0.03366
loud_mode.aov <- aov(loudness ~ mode)
summary(loud_mode.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 59 59.19 8.828 0.00307 **
## Residuals 691 4633 6.70
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tem_mode.aov <- aov(tempo ~ mode)
summary(tem_mode.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 928 928.2 1.101 0.294
## Residuals 691 582445 842.9
val_mode.aov <- aov(valence ~ mode)
summary(val_mode.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## mode 1 0.00 0.00049 0.01 0.92
## Residuals 691 32.95 0.04769
# Time signature
fol_time.aov <- aov(followers ~ time_signature)
summary(fol_time.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 1.400e+13 4.667e+12 0.112 0.953
## Residuals 689 2.874e+16 4.171e+13
aco_time.aov <- aov(acousticness ~ time_signature)
summary(aco_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 1.27 0.4238 7.275 8.28e-05 ***
## Residuals 689 40.14 0.0583
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dan_time.aov <- aov(danceability ~ time_signature)
summary(dan_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 0.239 0.07964 4.557 0.00359 **
## Residuals 689 12.042 0.01748
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dur_time.aov <- aov(duration_ms ~ time_signature)
summary(dur_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 1.156e+11 3.855e+10 5.488 0.000993 ***
## Residuals 689 4.840e+12 7.024e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ene_time.aov <- aov(energy ~ time_signature)
summary(ene_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 0.74 0.24678 6.543 0.000229 ***
## Residuals 689 25.99 0.03772
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ins_time.aov <- aov(instrumentalness ~ time_signature)
summary(ins_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 0.34 0.11337 4.217 0.00573 **
## Residuals 689 18.52 0.02689
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
liv_time.aov <- aov(liveness ~ time_signature)
summary(liv_time.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 0.021 0.00686 0.203 0.894
## Residuals 689 23.276 0.03378
loud_time.aov <- aov(loudness ~ time_signature)
summary(loud_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 78 25.887 3.865 0.00927 **
## Residuals 689 4614 6.697
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tem_time.aov <- aov(tempo ~ time_signature)
summary(tem_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 7794 2598.1 3.11 0.0259 *
## Residuals 689 575579 835.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
val_time.aov <- aov(valence ~ time_signature)
summary(val_time.aov) # SIGNIFICANT
## Df Sum Sq Mean Sq F value Pr(>F)
## time_signature 3 0.52 0.17279 3.671 0.0121 *
## Residuals 689 32.43 0.04707
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
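The thirty ANOVA calls above follow one pattern, so they can be generated in a loop and their p-values collected in a single table. This is a minimal sketch on simulated data; `anova_p()` is a hypothetical helper and `toy` stands in for `training_set`. It also shows a Holm adjustment with `p.adjust()`, which is one way to account for running many tests at once.

```r
set.seed(1)
# Toy data standing in for training_set (illustrative only)
toy <- data.frame(
  mode    = factor(rep(c(0, 1), each = 30)),
  energy  = c(rnorm(30, mean = 0.5, sd = 0.1), rnorm(30, mean = 0.8, sd = 0.1)),
  valence = rnorm(60, mean = 0.5, sd = 0.2)
)

# p-value of the one-way ANOVA F-test of `response` on `group`
anova_p <- function(df, response, group) {
  fit <- aov(reformulate(group, response = response), data = df)
  summary(fit)[[1]][["Pr(>F)"]][1]
}

responses <- c("energy", "valence")
p_raw  <- sapply(responses, function(v) anova_p(toy, v, "mode"))
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(response = responses, p_raw = p_raw, p_holm = p_holm, row.names = NULL)
```

Passing `data = df` to `aov()` also removes the need for `attach()`, which avoids the masking message shown earlier.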
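At the 5% level, key was associated with followers, danceability, and energy; mode with followers, acousticness, energy, and loudness; and time signature with every continuous variable except followers and liveness. These are 30 separate, unadjusted tests, so some of the marginal results may be false positives. What these associations imply for the model remains to be discussed.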
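We then computed partial correlations among the continuous variables, that is, the correlation between each pair of variables after controlling for all the others. The p-values are adjusted with Holm's method.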
# Partial correlations
correlation(training_set[,c(-1, -2, -10, -13, -15)], partial = TRUE)
## # Correlation Matrix (pearson-method)
##
## Parameter1 | Parameter2 | r | 95% CI | t(691) | p
## -------------------------------------------------------------------------------------
## Year | followers | -0.09 | [-0.16, -0.02] | -2.37 | 0.580
## Year | acousticness | 0.15 | [ 0.07, 0.22] | 3.94 | 0.004**
## Year | danceability | 0.14 | [ 0.07, 0.21] | 3.77 | 0.007**
## Year | duration_ms | -0.08 | [-0.15, 0.00] | -2.07 | > .999
## Year | energy | -0.03 | [-0.10, 0.05] | -0.74 | > .999
## Year | instrumentalness | 0.13 | [ 0.05, 0.20] | 3.33 | 0.035*
## Year | liveness | -8.88e-03 | [-0.08, 0.07] | -0.23 | > .999
## Year | loudness | 0.24 | [ 0.17, 0.31] | 6.58 | < .001***
## Year | tempo | 0.10 | [ 0.02, 0.17] | 2.58 | 0.340
## Year | valence | -0.12 | [-0.19, -0.05] | -3.18 | 0.057
## followers | acousticness | -0.04 | [-0.11, 0.04] | -1.03 | > .999
## followers | danceability | -7.23e-03 | [-0.08, 0.07] | -0.19 | > .999
## followers | duration_ms | 0.03 | [-0.05, 0.10] | 0.74 | > .999
## followers | energy | -0.06 | [-0.14, 0.01] | -1.71 | > .999
## followers | instrumentalness | -0.06 | [-0.14, 0.01] | -1.65 | > .999
## followers | liveness | -0.06 | [-0.13, 0.02] | -1.47 | > .999
## followers | loudness | 0.11 | [ 0.04, 0.18] | 2.89 | 0.144
## followers | tempo | 0.01 | [-0.06, 0.09] | 0.33 | > .999
## followers | valence | -0.09 | [-0.16, -0.01] | -2.28 | 0.687
## acousticness | danceability | -0.03 | [-0.10, 0.05] | -0.72 | > .999
## acousticness | duration_ms | -0.07 | [-0.15, 0.00] | -1.92 | > .999
## acousticness | energy | -0.45 | [-0.51, -0.39] | -13.31 | < .001***
## acousticness | instrumentalness | 0.03 | [-0.05, 0.10] | 0.67 | > .999
## acousticness | liveness | 0.09 | [ 0.01, 0.16] | 2.35 | 0.593
## acousticness | loudness | -0.13 | [-0.20, -0.05] | -3.40 | 0.028*
## acousticness | tempo | 0.04 | [-0.04, 0.11] | 0.99 | > .999
## acousticness | valence | 0.07 | [-0.01, 0.14] | 1.79 | > .999
## danceability | duration_ms | -0.10 | [-0.18, -0.03] | -2.75 | 0.215
## danceability | energy | -0.15 | [-0.22, -0.08] | -3.99 | 0.003**
## danceability | instrumentalness | 0.04 | [-0.03, 0.11] | 1.06 | > .999
## danceability | liveness | -0.13 | [-0.20, -0.05] | -3.36 | 0.032*
## danceability | loudness | -0.01 | [-0.09, 0.06] | -0.33 | > .999
## danceability | tempo | -0.31 | [-0.37, -0.24] | -8.50 | < .001***
## danceability | valence | 0.53 | [ 0.47, 0.58] | 16.23 | < .001***
## duration_ms | energy | 0.06 | [-0.02, 0.13] | 1.49 | > .999
## duration_ms | instrumentalness | 0.16 | [ 0.09, 0.23] | 4.35 | < .001***
## duration_ms | liveness | -0.04 | [-0.11, 0.04] | -0.98 | > .999
## duration_ms | loudness | -0.08 | [-0.16, -0.01] | -2.22 | 0.771
## duration_ms | tempo | 0.01 | [-0.06, 0.09] | 0.30 | > .999
## duration_ms | valence | -0.15 | [-0.22, -0.07] | -3.86 | 0.005**
## energy | instrumentalness | 0.16 | [ 0.09, 0.23] | 4.30 | < .001***
## energy | liveness | 0.15 | [ 0.08, 0.23] | 4.11 | 0.002**
## energy | loudness | 0.62 | [ 0.57, 0.66] | 20.64 | < .001***
## energy | tempo | 0.04 | [-0.03, 0.12] | 1.10 | > .999
## energy | valence | 0.26 | [ 0.18, 0.32] | 6.95 | < .001***
## instrumentalness | liveness | -0.06 | [-0.14, 0.01] | -1.67 | > .999
## instrumentalness | loudness | -0.18 | [-0.25, -0.11] | -4.85 | < .001***
## instrumentalness | tempo | 0.06 | [-0.01, 0.14] | 1.70 | > .999
## instrumentalness | valence | -0.09 | [-0.16, -0.02] | -2.40 | 0.554
## liveness | loudness | -0.08 | [-0.15, 0.00] | -2.02 | > .999
## liveness | tempo | -0.05 | [-0.13, 0.02] | -1.38 | > .999
## liveness | valence | 0.01 | [-0.06, 0.09] | 0.29 | > .999
## loudness | tempo | -0.03 | [-0.10, 0.05] | -0.70 | > .999
## loudness | valence | 7.25e-04 | [-0.07, 0.08] | 0.02 | > .999
## tempo | valence | 0.16 | [ 0.09, 0.23] | 4.33 | < .001***
##
## p-value adjustment method: Holm (1979)
## Observations: 693
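To make the idea of a partial correlation concrete, the sketch below derives it from the inverse of the correlation matrix in base R. The helper `partial_cor()` and the simulated variables are assumptions for illustration; the `correlation` package may compute its estimates differently, so this is not a reproduction of the table above.

```r
set.seed(7)
n <- 500
# Toy variables: x and y are related only through z (illustrative only)
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.5)
y <- z + rnorm(n, sd = 0.5)
toy <- cbind(x = x, y = y, z = z)

# Partial correlations from the inverse of the correlation matrix
partial_cor <- function(m) {
  p  <- solve(cor(m))                        # precision matrix
  pc <- -p / sqrt(outer(diag(p), diag(p)))   # rescale and flip the sign
  diag(pc) <- 1
  pc
}

round(cor(toy), 2)          # marginal: x and y look strongly correlated
round(partial_cor(toy), 2)  # partial: the x-y link mostly disappears given z
```

The same value results from correlating the residuals of `lm(x ~ z)` with those of `lm(y ~ z)`, which is the usual way of describing "controlling for" a variable.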
# Plots of variables with the largest partial correlation
ggplot(data = training_set, aes(danceability, valence)) + geom_jitter(color = "blue")
ggplot(data = training_set, aes(loudness, energy)) + geom_jitter(color = "blue")
ggplot(data = training_set, aes(acousticness, energy)) + geom_jitter(color = "blue")
# Inspecting the unusually long song (duration outlier)
which.max(data$duration_ms)
## [1] 448
data[504, ]
## # A tibble: 1 × 16
## names IsWinner Year followers acousticness danceability duration_ms energy
## <chr> <int> <int> <int> <dbl> <dbl> <int> <dbl>
## 1 Drops -… 0 2014 886702 0.853 0.703 173627 0.237
## # ℹ 8 more variables: instrumentalness <dbl>, key <fct>, liveness <dbl>,
## # loudness <dbl>, mode <fct>, tempo <dbl>, time_signature <fct>,
## # valence <dbl>
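In the output above, `which.max()` returns 448 while the row inspected is 504, whose duration (173,627 ms) is well below the maximum reported in the training-set summary (1,355,938 ms). Hard-coded row numbers can drift once rows are dropped, so it is usually safer to index with the computed position. A minimal sketch on toy data, with illustrative song names:

```r
# Toy data standing in for the song table (illustrative only)
toy <- data.frame(
  names       = c("Song A", "Song B", "Song C"),
  duration_ms = c(210000L, 1355938L, 173627L)
)

# Index by the computed position, not a hard-coded row number
longest <- which.max(toy$duration_ms)
toy[longest, ]

# Duration in minutes, easier to read than milliseconds
toy$duration_ms[longest] / 60000
```

On our data, the corresponding call would be `data[which.max(data$duration_ms), ]`.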
TALK ABOUT WHY SOME MAY BE SIGNIFICANT; WHAT THAT MEANS.